Abstract


In educational testing, subscores may be provided based on a portion of the items from a larger 
test. One consideration in evaluation of such subscores is their ability to predict a criterion 
score. Two limitations on prediction exist. The first, which is well known, is that the coefficient 
of determination for linear prediction of the criterion score by the subscore cannot exceed the 
reliability coefficient of the subscore. The second limitation is on incremental validity. The 
coefficient of determination for linear prediction of the criterion score by both the total score and 
the subscore is at least as great as the coefficient of determination for linear prediction of the 
criterion score by only the total score. Incremental validity may be measured by the difference 
between these two coefficients of determination. This difference is no greater than the reliability 
of the residual from linear prediction of the subscore by the total score. 

When subscores based on sections of a larger test are employed to provide more detailed
information about an examinee than is available from a total test score, it is reasonable to ask 
to what extent the subscores can provide useful predictions of a criterion score. A more subtle 
question is whether the subscore can provide incremental validity given that the total score is 
already used to predict the criterion score. In fact, significant limitations exist on validity. These 
limitations depend on the reliability of the subscore, the reliability of the total score, and on the 
correlation of the true subscore and true total score. The limitations apply to any criterion score. 

To examine these limitations, some basic results from classical test theory are provided in
section 1. In section 2, these results are applied to yield the desired limits. Some examples
with operational data are provided in section 3. In section 4, conclusions are reached concerning
practical implications. The notation and arguments used parallel those in Haberman (2008).

1 Results From Classical Test Theory 

Let $S_X$ be a random variable that represents an observed subscore of an examinee randomly
selected from some population, and let $S_Z$ be a random variable that represents the observed total
score for that randomly selected examinee. Let $S_Y = S_Z - S_X$, the observed remainder score, be
the portion of the observed total score not ascribed to the observed subscore. Let $S_V$ be a random
variable that represents the observed value of an external criterion score for the examinee. This
report considers linear prediction of $S_V$ by $S_X$, linear prediction of $S_V$ by $S_Z$, and linear
prediction of $S_V$ by both $S_X$ and $S_Z$. As in classical test theory (Haberman, 2008; Holland &
Hoskens, 2003), for the randomly selected examinee, consider a randomly selected test form from a
collection of parallel test forms and a randomly selected validity measurement from a corresponding
collection of parallel measurements of the validity criterion under consideration. Assume that
selection of test form and criterion score are independent. Then $S_X = \tau_X + e_X$,
$S_Y = \tau_Y + e_Y$, $S_Z = \tau_Z + e_Z$, and $S_V = \tau_V + e_V$. Here the true subscore $\tau_X$
is the conditional expected value of $S_X$ given the examinee, $e_X = S_X - \tau_X$ is the
measurement error of the subscore, the true total score $\tau_Z$ is the conditional expected value
of $S_Z$ given the examinee, $e_Z = S_Z - \tau_Z$ is the measurement error of the total score, the
true remainder score $\tau_Y$ is the conditional expected value of $S_Y$ given the examinee,
$e_Y = S_Y - \tau_Y$ is the measurement error of the remainder score, the true criterion score
$\tau_V$ is the conditional expected value of $S_V$ given the examinee, and $e_V = S_V - \tau_V$ is
the measurement error of the criterion score. Under the assumption that the observed measurements
$S_X$, $S_Z$, and $S_V$ all have finite and positive variances, the observed subscore $S_X$, the true
subscore $\tau_X$, the measurement error $e_X$, the true total score $\tau_Z$, the error of
measurement $e_Z$, the observed remainder score $S_Y$, the true remainder score $\tau_Y$, the
measurement error $e_Y$, the true criterion score $\tau_V$, and the corresponding measurement error
$e_V$ are all random variables with finite means and variances. The expectations satisfy the
constraints that $E(S_X) = E(\tau_X)$, $E(S_Z) = E(\tau_Z)$, $E(S_Y) = E(\tau_Y)$,
$E(S_V) = E(\tau_V)$, and $E(e_X) = E(e_Z) = E(e_Y) = E(e_V) = 0$. The errors of measurement are all
uncorrelated with the true scores. The error of measurement $e_X$ for the subscore and the error of
measurement $e_Y$ for the remainder score are uncorrelated, so that the covariance
$\mathrm{Cov}(e_X, e_Z) = \mathrm{Cov}(e_X, e_X + e_Y)$ of the measurement errors $e_X$ and $e_Z$ is
the variance $\sigma^2(e_X)$ of the error of measurement $e_X$ of the subscore (Haberman, 2008).

To avoid trivial cases, it is assumed that the variance $\sigma^2(\tau_X)$ of the true subscore
$\tau_X$, the variance $\sigma^2(\tau_Y)$ of the true remainder score $\tau_Y$, the variance
$\sigma^2(\tau_Z)$ of the true total score $\tau_Z$, the variance $\sigma^2(\tau_V)$ of the true
criterion score $\tau_V$, the variance $\sigma^2(e_X)$ of the measurement error $e_X$, the variance
$\sigma^2(e_Y)$ of the measurement error $e_Y$ of the remainder score, and the variance
$\sigma^2(e_V)$ of the measurement error $e_V$ are all positive. The variance
$\sigma^2(e_Z) = \sigma^2(e_X) + \sigma^2(e_Y)$ of the measurement error $e_Z$, the variance
$\sigma^2(S_X) = \sigma^2(\tau_X) + \sigma^2(e_X)$ of the observed subscore $S_X$, the variance
$\sigma^2(S_Y) = \sigma^2(\tau_Y) + \sigma^2(e_Y)$ of the observed remainder score $S_Y$, the variance
$\sigma^2(S_Z) = \sigma^2(\tau_Z) + \sigma^2(e_Z)$ of the observed total score $S_Z$, and the variance
$\sigma^2(S_V) = \sigma^2(\tau_V) + \sigma^2(e_V)$ of the observed criterion score $S_V$ are all
positive (Lord & Novick, 1968, p. 57). Thus the observed subscore $S_X$ has reliability coefficient
$\rho^2(S_X, \tau_X) = \sigma^2(\tau_X)/\sigma^2(S_X)$ equal to the square of the correlation
$\rho(S_X, \tau_X)$ of $S_X$ and the true subscore $\tau_X$, the observed remainder score $S_Y$ has
reliability coefficient $\rho^2(S_Y, \tau_Y) = \sigma^2(\tau_Y)/\sigma^2(S_Y)$ equal to the square of
the correlation $\rho(S_Y, \tau_Y)$ of $S_Y$ and the true remainder score $\tau_Y$, the observed
total score $S_Z$ has reliability coefficient $\rho^2(S_Z, \tau_Z) = \sigma^2(\tau_Z)/\sigma^2(S_Z)$
equal to the square of the correlation $\rho(S_Z, \tau_Z)$ of $S_Z$ and the true total score
$\tau_Z$, and the observed criterion score $S_V$ has reliability coefficient
$\rho^2(S_V, \tau_V) = \sigma^2(\tau_V)/\sigma^2(S_V)$ equal to the square of the correlation
$\rho(S_V, \tau_V)$ of $S_V$ and the true criterion score $\tau_V$. All reliability coefficients are
positive and less than 1 (Lord & Novick, 1968, p. 61).
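
The variance decompositions above can be illustrated numerically. The following Python sketch is
not part of the original analysis; the function names and the variance values are hypothetical and
chosen only for illustration. It computes a reliability coefficient as the ratio of true-score
variance to observed-score variance and checks that it equals the squared correlation of the
observed score and the true score.

```python
import math


def reliability(var_true: float, var_error: float) -> float:
    """Reliability coefficient: true-score variance over observed-score variance."""
    if var_true <= 0 or var_error <= 0:
        raise ValueError("variances must be positive")
    return var_true / (var_true + var_error)


def corr_observed_true(var_true: float, var_error: float) -> float:
    """Correlation of an observed score S with its true score tau.

    Measurement error is uncorrelated with the true score, so
    Cov(S, tau) = Var(tau) and Var(S) = Var(tau) + Var(e).
    """
    var_observed = var_true + var_error
    return var_true / math.sqrt(var_observed * var_true)


# Hypothetical variances, used only for illustration.
var_tau_x, var_e_x = 3.0, 1.0  # subscore: true-score and error variance
var_tau_y, var_e_y = 7.0, 2.0  # remainder score: true-score and error variance

# Errors of subscore and remainder score are uncorrelated, so their variances add.
var_e_z = var_e_x + var_e_y

rel_x = reliability(var_tau_x, var_e_x)
rho_x = corr_observed_true(var_tau_x, var_e_x)
```

With these assumed values, `rel_x` agrees with `rho_x ** 2` up to rounding error, as the
definition of the reliability coefficient requires.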

2 Prediction of the Criterion Score


If $\rho(S_X, S_V)$ is the correlation of the observed subscore $S_X$ and the observed criterion
score $S_V$ and if $\rho(\tau_X, \tau_V)$ is the correlation of the true subscore $\tau_X$ and the
true criterion score $\tau_V$, then

$$\rho(S_X, S_V) = \rho(\tau_X, \tau_V)\,\rho(S_V, \tau_V)\,\rho(S_X, \tau_X) \tag{1}$$

(Holland & Hoskens, 2003), so that

$$|\rho(S_X, S_V)| \le \rho(S_V, \tau_V)\,\rho(S_X, \tau_X) \tag{2}$$

and the coefficient of determination $\rho^2(S_V \mid S_X) = \rho^2(S_X, S_V)$ for prediction of
$S_V$ by $S_X$ satisfies

$$\rho^2(S_V \mid S_X) \le \rho^2(S_V, \tau_V)\,\rho^2(S_X, \tau_X).$$

The product $\sigma^2(S_V)[1 - \rho^2(S_V \mid S_X)]$ is the mean-squared error achieved by linear
prediction of the criterion score $S_V$ by the observed subscore $S_X$. As is well known, (1) and (2)
show that the ability to predict the validity criterion is constrained by the reliability
coefficients of both the validity criterion and the subscore (Lord & Novick, 1968, p. 72). If the
subscore has limited reliability, then prediction of the validity criterion cannot be very
effective. Thus a subscore with a reliability of 0.25 cannot produce a coefficient of determination
greater than 0.25, and the combination of a subscore with a reliability of 0.25 and a criterion
score with a reliability of 0.25 cannot yield a coefficient of determination greater than 0.0625.
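
The arithmetic in the preceding paragraph can be reproduced with a short Python sketch. The
function name `validity_upper_bound` is hypothetical and not taken from the source; it simply
evaluates the product of the two reliability coefficients that bounds the coefficient of
determination.

```python
def validity_upper_bound(rel_subscore: float, rel_criterion: float = 1.0) -> float:
    """Upper bound on the coefficient of determination for predicting the
    criterion score from the subscore: the product of the two reliabilities."""
    for value in (rel_subscore, rel_criterion):
        if not 0.0 < value <= 1.0:
            raise ValueError("reliability coefficients must lie in (0, 1]")
    return rel_subscore * rel_criterion


# Subscore reliability 0.25, criterion treated as perfectly reliable.
bound_perfect_criterion = validity_upper_bound(0.25)

# Subscore reliability 0.25 and criterion reliability 0.25.
bound_noisy_criterion = validity_upper_bound(0.25, 0.25)
```

The default `rel_criterion=1.0` corresponds to the weaker statement that the coefficient of
determination cannot exceed the subscore reliability alone.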

The incremental contribution of the observed subscore appears to have been less studied. To
determine this contribution, one first considers linear prediction of the observed subscore $S_X$ by
the observed total score $S_Z$. The best linear predictor is

$$L_{X \cdot Z} = E(S_X) + \beta_{X \cdot Z}[S_Z - E(S_Z)],$$

where

$$\beta_{X \cdot Z} = \frac{\mathrm{Cov}(S_X, S_Z)}{\sigma^2(S_Z)}$$

is the regression coefficient for linear prediction of $S_X$ by $S_Z$. The prediction error is then
the residual subscore

$$S_{X \cdot Z} = S_X - L_{X \cdot Z}.$$

Because the observed residual subscore $S_{X \cdot Z}$ is uncorrelated with the observed total score
$S_Z$, the coefficient of determination $\rho^2(S_V \mid S_Z, S_X)$ for prediction of the observed
criterion score $S_V$ by both the observed subscore $S_X$ and the observed total score $S_Z$
satisfies

$$\rho^2(S_V \mid S_Z, S_X) = \rho^2(S_V \mid S_Z) + \rho^2(S_V \mid S_{X \cdot Z})$$

(Lord & Novick, 1968, p. 266). A measure of the incremental validity of the subscore $S_X$ is the
difference

$$\rho^2(S_V \mid S_Z, S_X) - \rho^2(S_V \mid S_Z) = \rho^2(S_V \mid S_{X \cdot Z})$$

between the coefficient of determination for prediction of the observed criterion score $S_V$ by both
the observed subscore $S_X$ and the observed total score $S_Z$ and the coefficient of determination
for prediction of the observed criterion score $S_V$ by the observed total score $S_Z$.
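
The additive decomposition of the coefficient of determination can be checked numerically. The
Python sketch below is an illustration only: the function names and the covariance values are
hypothetical, and the two-predictor coefficient is obtained by solving the usual normal equations
rather than by any method stated in the source.

```python
def r2_one_predictor(var_v: float, var_p: float, cov_vp: float) -> float:
    """Squared correlation of the criterion V with a single predictor P."""
    return cov_vp ** 2 / (var_v * var_p)


def r2_two_predictors(var_v, var_z, var_x, cov_vz, cov_vx, cov_xz):
    """Coefficient of determination for predicting V from both Z and X,
    computed from the 2 x 2 normal equations."""
    det = var_z * var_x - cov_xz ** 2
    b_z = (cov_vz * var_x - cov_vx * cov_xz) / det
    b_x = (cov_vx * var_z - cov_vz * cov_xz) / det
    return (b_z * cov_vz + b_x * cov_vx) / var_v


def r2_residual_subscore(var_v, var_z, var_x, cov_vz, cov_vx, cov_xz):
    """Squared correlation of V with the residual of X after regression on Z."""
    beta_xz = cov_xz / var_z                 # regression coefficient of X on Z
    var_resid = var_x - beta_xz * cov_xz     # variance of the residual subscore
    cov_v_resid = cov_vx - beta_xz * cov_vz  # covariance of V with the residual
    return cov_v_resid ** 2 / (var_v * var_resid)


# Hypothetical second moments for (V, Z, X), used only for illustration.
moments = dict(var_v=1.0, var_z=19.0, var_x=4.0, cov_vz=2.0, cov_vx=1.2, cov_xz=7.0)

r2_total_only = r2_one_predictor(moments["var_v"], moments["var_z"], moments["cov_vz"])
r2_both = r2_two_predictors(**moments)
incremental = r2_residual_subscore(**moments)
```

For these assumed moments, `r2_both - r2_total_only` agrees with `incremental` up to rounding
error, which is the incremental validity measure defined above.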

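A bound on the coefficient of determination $\rho^2(S_V \mid S_{X \cdot Z})$ can be obtained by
noting that the residual score $S_{X \cdot Z}$, which is a linear combination of the observed
subscore $S_X$ and the observed total score $S_Z$, has a true residual score

$$\tau_{X \cdot Z} = \tau_X - E(S_X) - \beta_{X \cdot Z}[\tau_Z - E(S_Z)],$$

a measurement error

$$e_{X \cdot Z} = e_X - \beta_{X \cdot Z}\, e_Z,$$

and a coefficient of reliability

$$\rho^2(S_{X \cdot Z}, \tau_{X \cdot Z}) = \frac{\sigma^2(\tau_{X \cdot Z})}{\sigma^2(S_{X \cdot Z})}
= 1 - \frac{\sigma^2(e_{X \cdot Z})}{\sigma^2(S_{X \cdot Z})}.$$

Thus

$$\rho^2(S_V \mid S_{X \cdot Z}) \le \rho^2(S_V, \tau_V)\,\rho^2(\tau_V, \tau_{X \cdot Z})\,
\rho^2(S_{X \cdot Z}, \tau_{X \cdot Z}) \le \rho^2(S_V, \tau_V)\,\rho^2(S_{X \cdot Z}, \tau_{X \cdot Z}).$$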

The reliability $\rho^2(S_{X \cdot Z}, \tau_{X \cdot Z})$ is readily determined from sample data
(Haberman, 2008). The variance

$$\sigma^2(S_{X \cdot Z}) = \sigma^2(S_X)[1 - \rho^2(S_X, S_Z)]$$

and the regression coefficient $\beta_{X \cdot Z}$ are estimated as in standard regression analysis.
Classical reliability estimation methods lead to estimates of the variances of measurement
$\sigma^2(e_X)$ and $\sigma^2(e_Y) = \sigma^2(e_Z) - \sigma^2(e_X)$. The decomposition
$e_Z = e_X + e_Y$ leads to the formula

$$e_{X \cdot Z} = (1 - \beta_{X \cdot Z})\, e_X - \beta_{X \cdot Z}\, e_Y.$$

Because the measurement errors $e_X$ and $e_Y$ are uncorrelated, the variance
$\sigma^2(e_{X \cdot Z})$ of the error of measurement $e_{X \cdot Z}$ satisfies

$$\sigma^2(e_{X \cdot Z}) = (1 - \beta_{X \cdot Z})^2 \sigma^2(e_X) + \beta_{X \cdot Z}^2\, \sigma^2(e_Y).$$

Thus $\sigma^2(e_{X \cdot Z})$ is estimated by use of the estimates for $\beta_{X \cdot Z}$,
$\sigma^2(e_X)$, and $\sigma^2(e_Y)$. In turn, the estimate for $\sigma^2(e_{X \cdot Z})$ and the
estimate for $\sigma^2(S_{X \cdot Z})$ lead to an estimate for the reliability
$\rho^2(S_{X \cdot Z}, \tau_{X \cdot Z})$.
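
The estimation steps just described can be sketched in Python. This is a minimal illustration
under stated assumptions, not the procedure used for the operational data: the function name and
argument names are hypothetical, and the inputs are taken to be already-estimated variances,
covariances, and error variances.

```python
def residual_subscore_reliability(var_sx: float, var_sz: float, cov_sx_sz: float,
                                  var_ex: float, var_ez: float) -> float:
    """Reliability of the residual subscore from estimated second moments.

    var_sx, var_sz : variances of the observed subscore and total score
    cov_sx_sz      : covariance of the observed subscore and total score
    var_ex, var_ez : error variances of the subscore and total score
    """
    beta_xz = cov_sx_sz / var_sz                # regression coefficient of S_X on S_Z
    var_residual = var_sx - beta_xz * cov_sx_sz  # equals var_sx * (1 - rho^2(S_X, S_Z))
    var_ey = var_ez - var_ex                    # error variance of the remainder score
    # Errors e_X and e_Y are uncorrelated, so the variances of the two terms add.
    var_e_residual = (1.0 - beta_xz) ** 2 * var_ex + beta_xz ** 2 * var_ey
    return 1.0 - var_e_residual / var_residual


# Hypothetical estimates, used only for illustration.
example_reliability = residual_subscore_reliability(
    var_sx=4.0, var_sz=19.0, cov_sx_sz=7.0, var_ex=1.0, var_ez=3.0
)
```

In this assumed example the subscore itself has reliability 0.75, while the residual subscore has
reliability of roughly 0.53, which illustrates how much reliability can be lost once the total
score is partialled out.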

As is evident from the examples in section 3, in many typical cases, the reliability of the
residual subscore $S_{X \cdot Z}$ is low. The basic issue arises in the typical case in which the
true subscore $\tau_X$ and the true remainder score $\tau_Y$ have a positive correlation
$\rho(\tau_X, \tau_Y)$. Because the observed total score $S_Z$ is the sum $S_X + S_Y$ of the
observed subscore $S_X$ and the observed remainder score $S_Y$,

$$\mathrm{Cov}(S_X, S_Z) = \mathrm{Cov}(S_X, S_Y) + \sigma^2(S_X),$$

$$\mathrm{Cov}(S_Y, S_Z) = \mathrm{Cov}(S_X, S_Y) + \sigma^2(S_Y),$$

and

$$\sigma^2(S_Z) = \sigma^2(S_X) + \sigma^2(S_Y) + 2\,\mathrm{Cov}(S_X, S_Y).$$

Thus

$$\beta_{X \cdot Z} = \frac{\mathrm{Cov}(S_X, S_Y) + \sigma^2(S_X)}
{\sigma^2(S_X) + \sigma^2(S_Y) + 2\,\mathrm{Cov}(S_X, S_Y)}.$$

In like manner, if $\beta_{Y \cdot Z}$ is the regression coefficient for linear prediction of the
observed remainder score $S_Y$ by the total score $S_Z$, then

$$\beta_{Y \cdot Z} = \frac{\mathrm{Cov}(S_Y, S_Z)}{\sigma^2(S_Z)}
= \frac{\mathrm{Cov}(S_X, S_Y) + \sigma^2(S_Y)}
{\sigma^2(S_X) + \sigma^2(S_Y) + 2\,\mathrm{Cov}(S_X, S_Y)}.$$

As expected from the decomposition $S_X + S_Y = S_Z$, one has

$$\beta_{X \cdot Z} + \beta_{Y \cdot Z} = 1.$$

Because the true scores $\tau_X$ and $\tau_Y$ are uncorrelated with the measurement errors $e_X$ and
$e_Y$ and because the measurement errors $e_X$ and $e_Y$ are uncorrelated,

$$\mathrm{Cov}(S_X, S_Y) = \mathrm{Cov}(\tau_X, \tau_Y)$$
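(Lord & Novick, 1968, p. 62). It follows that the observed subscore $S_X$ and the observed remainder
score $S_Y$ are positively correlated if the true subscore $\tau_X$ and true remainder score
$\tau_Y$ are positively correlated. In this case of positive correlation, the regression
coefficients $\beta_{X \cdot Z}$ and $\beta_{Y \cdot Z}$ are both positive. This result has
substantial impact when one considers the behavior of the variance components
$\sigma^2(\tau_{X \cdot Z})$ and $\sigma^2(e_{X \cdot Z})$ that determine the reliability of the
residual subscore. The decompositions $\tau_Z = \tau_X + \tau_Y$ and
$\beta_{X \cdot Z} + \beta_{Y \cdot Z} = 1$ imply that the true residual subscore

$$\tau_{X \cdot Z} = \beta_{Y \cdot Z}[\tau_X - E(S_X)] - \beta_{X \cdot Z}[\tau_Y - E(S_Y)],$$

so that the variance

$$\sigma^2(\tau_{X \cdot Z}) = \beta_{Y \cdot Z}^2\, \sigma^2(\tau_X)
- 2 \beta_{Y \cdot Z} \beta_{X \cdot Z}\, \sigma(\tau_X)\, \sigma(\tau_Y)\, \rho(\tau_X, \tau_Y)
+ \beta_{X \cdot Z}^2\, \sigma^2(\tau_Y).$$

The correlation assumption implies the inequality

$$\sigma^2(\tau_{X \cdot Z}) < \beta_{Y \cdot Z}^2\, \sigma^2(\tau_X) + \beta_{X \cdot Z}^2\, \sigma^2(\tau_Y).$$

The previous formula for $\sigma^2(e_{X \cdot Z})$ can also be written as

$$\sigma^2(e_{X \cdot Z}) = \beta_{Y \cdot Z}^2\, \sigma^2(e_X) + \beta_{X \cdot Z}^2\, \sigma^2(e_Y),$$

and the variance

$$\sigma^2(S_{X \cdot Z}) = \sigma^2(\tau_{X \cdot Z}) + \sigma^2(e_{X \cdot Z}).$$

The reliability formulas $\sigma^2(\tau_X) = \sigma^2(S_X)\rho^2(S_X, \tau_X)$ and
$\sigma^2(\tau_Y) = \sigma^2(S_Y)\rho^2(S_Y, \tau_Y)$ and the decompositions
$\sigma^2(S_X) = \sigma^2(\tau_X) + \sigma^2(e_X)$ and
$\sigma^2(S_Y) = \sigma^2(\tau_Y) + \sigma^2(e_Y)$ imply that

$$\sigma^2(\tau_{X \cdot Z}) = \beta_{Y \cdot Z}^2\, \sigma^2(S_X)\rho^2(S_X, \tau_X)
+ \beta_{X \cdot Z}^2\, \sigma^2(S_Y)\rho^2(S_Y, \tau_Y)
- 2 \beta_{Y \cdot Z} \beta_{X \cdot Z}\, \sigma(S_X)\sigma(S_Y)\rho(S_X, \tau_X)\rho(S_Y, \tau_Y)\rho(\tau_X, \tau_Y)$$

and

$$\sigma^2(S_{X \cdot Z}) = \beta_{Y \cdot Z}^2\, \sigma^2(S_X) + \beta_{X \cdot Z}^2\, \sigma^2(S_Y)
- 2 \beta_{Y \cdot Z} \beta_{X \cdot Z}\, \sigma(S_X)\sigma(S_Y)\rho(S_X, \tau_X)\rho(S_Y, \tau_Y)\rho(\tau_X, \tau_Y).$$

Thus the reliability coefficient
$\rho^2(S_{X \cdot Z}, \tau_{X \cdot Z}) = \sigma^2(\tau_{X \cdot Z})/\sigma^2(S_{X \cdot Z})$ is less
than the weighted average

$$\frac{\beta_{Y \cdot Z}^2\, \sigma^2(S_X)\rho^2(S_X, \tau_X) + \beta_{X \cdot Z}^2\, \sigma^2(S_Y)\rho^2(S_Y, \tau_Y)}
{\beta_{Y \cdot Z}^2\, \sigma^2(S_X) + \beta_{X \cdot Z}^2\, \sigma^2(S_Y)}.$$
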
Obviously the reliability of the residual subscore $S_{X \cdot Z}$ will be low unless the subscore
$S_X$ or remainder score $S_Y$ has high reliability; however, even if $S_X$ and $S_Y$ have high
reliability, the reliability of $S_{X \cdot Z}$ is low if the correlation $\rho(\tau_X, \tau_Y)$ is
high, for the negative term
$-2 \beta_{Y \cdot Z} \beta_{X \cdot Z}\, \sigma(S_X)\sigma(S_Y)\rho(S_X, \tau_X)\rho(S_Y, \tau_Y)\rho(\tau_X, \tau_Y)$
then results in a substantial reduction in the reliability of $S_{X \cdot Z}$.
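
The effect of the true-score correlation on the reliability of the residual subscore can be seen
in a small Python sketch. The function name and the input values are hypothetical and serve only
as an illustration; the computation follows the variance formulas for the true residual subscore
and the observed residual subscore, with the cross term carrying the factor 2 that arises from the
variance of a difference.

```python
import math


def residual_reliability_and_bound(var_sx: float, var_sy: float,
                                   rel_x: float, rel_y: float,
                                   rho_true: float) -> tuple[float, float]:
    """Return (reliability of the residual subscore, weighted-average bound).

    var_sx, var_sy : observed variances of subscore and remainder score
    rel_x, rel_y   : their reliability coefficients
    rho_true       : correlation of true subscore and true remainder score
    """
    # Cov(S_X, S_Y) = Cov(tau_X, tau_Y) = sigma(tau_X) * sigma(tau_Y) * rho_true
    cov_xy = math.sqrt(var_sx * rel_x) * math.sqrt(var_sy * rel_y) * rho_true
    var_sz = var_sx + var_sy + 2.0 * cov_xy
    beta_x = (cov_xy + var_sx) / var_sz   # regression coefficient of S_X on S_Z
    beta_y = 1.0 - beta_x                 # regression coefficient of S_Y on S_Z
    true_main = beta_y ** 2 * var_sx * rel_x + beta_x ** 2 * var_sy * rel_y
    observed_main = beta_y ** 2 * var_sx + beta_x ** 2 * var_sy
    cross = 2.0 * beta_y * beta_x * cov_xy  # subtracted from both variances
    residual_reliability = (true_main - cross) / (observed_main - cross)
    weighted_average = true_main / observed_main
    return residual_reliability, weighted_average


# Hypothetical inputs: equal variances, reliabilities of 0.9 for both parts.
moderate = residual_reliability_and_bound(1.0, 1.0, 0.9, 0.9, rho_true=0.5)
high = residual_reliability_and_bound(1.0, 1.0, 0.9, 0.9, rho_true=0.95)
```

Under these assumed inputs the weighted average is 0.9 in both cases, but the reliability of the
residual subscore is about 0.82 when the true-score correlation is 0.5 and about 0.31 when it is
0.95.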

Some consideration of limits may help clarify results. For fixed $\rho(\tau_X, \tau_Y) < 1$,
$\sigma^2(S_X)$, and $\sigma^2(S_Y)$, let the reliability coefficients $\rho^2(S_X, \tau_X)$ and
$\rho^2(S_Y, \tau_Y)$ both approach 1. Then the reliability coefficients of both the subscore and
total score approach 1, the reliability of the residual $S_{X \cdot Z}$ approaches 1, the
coefficient of determination $\rho^2(S_V \mid S_Z)$ converges to
$\rho^2(S_V, \tau_V)\rho^2(\tau_V, \tau_Z)$, and $\rho^2(S_V \mid S_Z, S_X)$ converges to

$$\rho^2(S_V \mid \tau_Z, \tau_X) = \rho^2(S_V, \tau_V)\{\rho^2(\tau_V, \tau_Z)
+ [1 - \rho^2(\tau_X, \tau_Z)]\,\rho^2(\tau_V, \tau_X \mid \tau_Z)\},$$

where $\rho(\tau_V, \tau_X \mid \tau_Z)$ is the partial correlation of the true scores $\tau_X$ and
$\tau_V$ given the true score $\tau_Z$. Thus the incremental validity measure
$\rho^2(S_V \mid S_{X \cdot Z})$ has a limit no greater than
$\rho^2(S_V, \tau_V)[1 - \rho^2(\tau_X, \tau_Z)]$, so that high correlation of the true total score
and the true subscore limits the incremental validity even when the reliability coefficients are
high for both the subscore and the total score.


3 Examples 

To illustrate results, the analysis in this section may be applied to the examples considered 
in Haberman (2008). In the first example (Tables 1, 2, and 3), subscores were examined from 
an SAT® I administration from 2002. In these tables, the SAT verbal examination is divided 
into the sections Verbal I, Verbal II, and Verbal III, while the SAT math examination is divided 
into the sections Math I, Math II, and Math III. Alternatively, the SAT verbal has sections for 
critical reading (CR), analogies (A), and sentence completion (SC), while the SAT math has 
sections for four-choice math multiple-choice (Math 4c), five-choice multiple choice (Math 5c), 
and student-produced math responses (Math S). Note that the SAT I examination of 2002 is
substantially different from the current SAT Reasoning Test™, and reporting of these scores was
confined to reports of raw scores on CR, A, and SC to examinees but not to institutions.

Table 1

Estimated Reliability Coefficients of Subscores and Residual Subscores for SAT Verbal

Subscore      Subscore reliability    Residual subscore reliability
Verbal I      0.84                    0.11
Verbal II     0.80                    0.02
Verbal III    0.72                    0.17
CR            0.84                    0.24
A             0.74                    0.16
SC            0.78                    0.15

Note. CR = critical reading, A = analogies, SC = sentence completion.


Table 2

Estimated Reliability Coefficients of Subscores and Residual Subscores for SAT Math

Subscore      Subscore reliability    Residual subscore reliability
Math I        0.87                    0.08
Math II       0.83                    0.10
Math III      0.64                    0.08
Math 4c       0.72                    0.08
Math 5c       0.89                    0.06
Math S        0.73                    0.12

Note. Math 4c = four-choice math multiple-choice, Math 5c = five-choice math
multiple-choice, Math S = student-produced math responses.


Table 3

Estimated Reliability Coefficients of Subscores and Residual Subscores for SAT Total

Subscore      Subscore reliability    Residual subscore reliability
Verbal        0.91                    0.72
Math          0.92                    0.72


For each subscore in each table, the estimated reliability coefficient is high enough that it
imposes only a relatively limited restriction on validity results. The smallest estimated
reliability coefficient is for Math III as a subscore of math. The coefficient of 0.64 for this
case just implies that 0.64 is at least as large as the coefficient of determination for prediction
of the criterion score by the Math III score. On the other hand, far stronger restrictions on
incremental validity are present when subscores of verbal or math are considered. The extreme case
is Verbal II, where the estimated reliability of the residual of 0.02 severely restricts
incremental validity. The coefficient of determination for prediction of the criterion score by the
Verbal II and the total verbal scores cannot be more than 0.02 greater than the coefficient of
determination for prediction of the criterion score by the total verbal score. It is notable that
no math subscore offers much possibility in terms of incremental validity, for the maximum
reliability of a residual subscore is 0.12, a value achieved for Math S, the student-produced
responses. Even here, the potential incremental improvement in prediction of a criterion score is
quite limited. The coefficient of determination from use of the total math score to predict the
criterion score cannot be more than 0.12 less than the coefficient of determination from prediction
of the criterion score by both the total math score and the Math S subscore. In practice, further
limits can be expected because the reliability of the criterion score is typically somewhat less
than 1 and the correlation of the true residual score and the true criterion score may well be
somewhat less than 1.

The highest potential among the subscores of the verbal and math tests is for the critical
reading portion of the verbal test, where the upper bound for incremental validity is 0.24. More
generally, analogies, sentence completion, and Verbal III from the verbal test have higher potential
for incremental validity than any subscores from the math test.

The situation is quite different when verbal and math are considered to be subscores of a 
total SAT score. Both subscores have quite high reliability, and the restriction on incremental 
validity is of little consequence, for the reliability of the residual subscores is 0.72. 

For a second example, consider the Praxis examination results in Haberman (2008).
Here subscores are present for English language arts (E), mathematics (M), citizenship and
social science (C), and science (S). Results are summarized in Table 4. The subscore reliability
coefficients, all of which are at least 0.68, provide only modest restrictions on the correlation of a
validity criterion with the subscore. The restriction on incremental validity is somewhat less for
English language arts and mathematics than for citizenship and social science and for science,
for the residual subscore reliability coefficients have estimates of 0.25 and 0.29 for science and
for citizenship and social science, respectively, while the corresponding reliability coefficients
for English language arts and mathematics are 0.43 and 0.49, respectively. Thus at least some
possibility for appreciable incremental validity exists for all subscores.

Table 4

Estimated Reliability Coefficients of Subscores and Residual Subscores for Praxis Data

Subscore      Subscore reliability    Residual subscore reliability
E             0.73                    0.43
M             0.79                    0.48
C             0.68                    0.29
S             0.69                    0.25

Note. C = citizenship and social science, E = English language arts,
M = mathematics, S = science.


4 Conclusions 

In terms of validity, results in this report indicate that subscores have limited potential 
value unless they have some level of reliability and unless the true subscores are not very highly 
correlated with the true total score. Adequate reliability of the subscore is required for any 
possible validity result. The subscore cannot be highly correlated with a criterion score unless the 
subscore has a high reliability coefficient. 

Even for a subscore with high reliability, potential validity results are limited when the
correlation of the true subscore $\tau_X$ and the true remainder score $\tau_Y$ is quite high. This
situation corresponds to a high correlation of the true subscore $\tau_X$ and the true total score
$\tau_Z = \tau_X + \tau_Y$.

The limits on validity can have modest impact even if the correlation $\rho(\tau_X, \tau_Z)$
indicates a strong relationship between the true scores $\tau_X$ and $\tau_Z$. This situation holds
for the SAT math and verbal scales in Table 3, for the estimated value of $\rho(\tau_X, \tau_Z)$
exceeds 0.9 for these cases (Haberman, 2008).

In practice, limits on incremental validity should be examined to verify that subscores have 
any realistic possibility of usefulness. To be sure, some potential for utility does not ensure actual 
usefulness. On the other hand, negligible potential can eliminate any need for further study. 